Conversation
Pull request overview
Adds a new remote seed dataset loader for the HuggingFace lmsys/toxic-chat dataset so it can be discovered/loaded through PyRIT’s dataset provider framework, with accompanying unit tests and docs update.
Changes:
- Introduces a `_ToxicChatDataset` remote loader that converts ToxicChat rows into `SeedPrompt`s and skips prompts that fail template parsing.
- Adds unit tests for the ToxicChat loader.
- Updates the dataset-loading documentation output to include `toxic_chat` in the available dataset list.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py` | New HuggingFace-backed loader that builds `SeedDataset`/`SeedPrompt` entries from ToxicChat rows and skips problematic entries. |
| `tests/unit/datasets/test_toxic_chat_dataset.py` | New unit tests validating basic loading behavior and custom HF config plumbing. |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Exposes/imports the new loader for discovery/registration. |
| `doc/code/datasets/1_loading_datasets.ipynb` | Updates the displayed list of available datasets to include `toxic_chat`. |
Comments suppressed due to low confidence (1)
tests/unit/datasets/test_toxic_chat_dataset.py:78
- For consistency with other dataset loader tests (e.g., DarkBench), this test should assert that `split` (and optionally `cache`) are forwarded to `_fetch_from_huggingface` in addition to `dataset_name` and `config`; otherwise regressions in argument plumbing won't be caught.
```python
with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
    dataset = await loader.fetch_dataset()
    assert len(dataset.seeds) == 2
    mock_fetch.assert_called_once()
    call_kwargs = mock_fetch.call_args.kwargs
    assert call_kwargs["dataset_name"] == "custom/toxic-chat"
    assert call_kwargs["config"] == "custom_config"
```
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
tests/unit/datasets/test_toxic_chat_dataset.py:78
- This test configures `split="test"` but doesn't assert that `fetch_dataset()` passes `split` through to `_fetch_from_huggingface` (unlike similar tests for other datasets). Adding an assertion for `call_kwargs["split"]` (and optionally `cache`) would improve coverage of the custom-config behavior.
```python
loader = _ToxicChatDataset(
    dataset_name="custom/toxic-chat",
    config="custom_config",
    split="test",
)
with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
    dataset = await loader.fetch_dataset()
    assert len(dataset.seeds) == 2
    mock_fetch.assert_called_once()
    call_kwargs = mock_fetch.call_args.kwargs
    assert call_kwargs["dataset_name"] == "custom/toxic-chat"
    assert call_kwargs["config"] == "custom_config"
```
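A minimal standalone sketch of the assertions the review asks to add, assuming `fetch_dataset()` forwards `split` (and `cache`) as keyword arguments the way other loaders do — the `cache=True` value and the bare `MagicMock` stand-in are hypothetical, not the real test fixture:

```python
from unittest.mock import MagicMock

# Stand-in for the patched _fetch_from_huggingface; in the real test this
# would be the mock_fetch object returned by patch.object above.
mock_fetch = MagicMock(return_value=[])
mock_fetch(dataset_name="custom/toxic-chat", config="custom_config", split="test", cache=True)

call_kwargs = mock_fetch.call_args.kwargs
assert call_kwargs["dataset_name"] == "custom/toxic-chat"
assert call_kwargs["config"] == "custom_config"
assert call_kwargs["split"] == "test"  # the assertion the review asks to add
assert call_kwargs["cache"] is True    # optional, per the review
```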
Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real user-chatbot conversations from Chatbot Arena annotated for toxicity and jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HF dataset identifier is now a class constant `HF_DATASET_NAME` instead of a constructor parameter, consistent with other loaders.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
doc/code/datasets/1_loading_datasets.ipynb:242
- The notebook metadata `language_info.version` changed from 3.11.14 to 3.13.5, which is typically just an artifact of the author’s local environment. To reduce churn in docs, consider keeping this stable (or stripping environment-specific metadata) when updating notebooks.
```json
"version": "3.13.5"
```
Raw wrapping preserves original dataset text including Jinja2-like syntax (HTML, template tags), eliminating the need to catch and skip `TemplateSyntaxError` entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py:112
- Remote dataset loaders in this repo typically wrap prompt text in Jinja2 `{% raw %}...{% endraw %}` before constructing a `SeedPrompt` so that `SeedPrompt.__post_init__` doesn't render/alter prompt content and so template-like text doesn't trigger parsing errors. Here `value=item["user_input"]` plus catching `TemplateSyntaxError` means some prompts will be silently dropped and others may still be mutated by Jinja rendering. Consider wrapping `user_input` in raw tags (and only skipping if it still fails), and include the `conv_id` in the debug log for traceability.
```python
            description=description,
            source=source_url,
            authors=authors,
            groups=groups,
            metadata={
                "toxicity": str(item.get("toxicity", "")),
                "jailbreaking": str(item.get("jailbreaking", "")),
                "human_annotation": str(item.get("human_annotation", "")),
            },
        )
        for item in data
    ]
    logger.info(f"Successfully loaded {len(seed_prompts)} prompts from ToxicChat dataset")
    return SeedDataset(seeds=seed_prompts, dataset_name=self.dataset_name)
```
```python
async def test_fetch_dataset_preserves_jinja2_content(self):
    """Test that entries with Jinja2-like content are preserved via raw wrapping."""
    data_with_html = [
        {
```
The tests don’t exercise the advertised behavior of skipping truly Jinja2-incompatible inputs (e.g., `user_input` containing `{% endraw %}`, which can break parsing even when wrapped). After adding skip/exception handling in the loader, add a unit test asserting such an entry is skipped while the rest of the dataset still loads.
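To illustrate why such a test matters, here is a small standalone check (using `jinja2` directly, not PyRIT's `SeedPrompt`) showing that `{% raw %}` wrapping protects ordinary template-like text but not text that itself contains `{% endraw %}`:

```python
from jinja2 import Environment, TemplateSyntaxError

env = Environment()

def parses_when_wrapped(text: str) -> bool:
    """Return True if the raw-wrapped text is accepted by the Jinja2 parser."""
    try:
        env.from_string("{% raw %}" + text + "{% endraw %}")
        return True
    except TemplateSyntaxError:
        return False

# HTML and template-like syntax survive raw wrapping...
assert parses_when_wrapped("Hello <b>{{ not_a_var }}</b>") is True
# ...but an embedded {% endraw %} closes the wrapper early, leaving a stray
# {% endraw %} tag that raises TemplateSyntaxError ("unknown tag 'endraw'").
assert parses_when_wrapped("break out {% endraw %} oops") is False
```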
```python
seed_prompts = [
    SeedPrompt(
        value=f"{{% raw %}}{item['user_input']}{{% endraw %}}",
        data_type="text",
        dataset_name=self.dataset_name,
```
`fetch_dataset` builds `SeedPrompt` objects in a list comprehension, but `SeedPrompt.__post_init__` can raise when Jinja2 cannot parse the template (e.g., if `user_input` contains `{% endraw %}` or otherwise breaks the outer `{% raw %}...{% endraw %}` wrapper). In that case, the entire dataset load will fail rather than “gracefully skipping” bad entries as described. Consider switching to an explicit loop that wraps `SeedPrompt(...)` construction in a try/except (catching `ValueError` / Jinja2 `TemplateSyntaxError`), logs a warning with `conv_id`, and continues.
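A sketch of that explicit loop, with `env.from_string` standing in for the Jinja2 parse that `SeedPrompt.__post_init__` performs — the helper name and the bare-string return are illustrative, not PyRIT API:

```python
import logging
from jinja2 import Environment, TemplateSyntaxError

logger = logging.getLogger(__name__)
env = Environment()

def build_seed_prompts(data):
    """Construct prompts one at a time so a single bad row is skipped, not fatal."""
    seed_prompts = []
    for item in data:
        wrapped = "{% raw %}" + item["user_input"] + "{% endraw %}"
        try:
            # Stand-in for SeedPrompt(...), whose __post_init__ parses the template.
            env.from_string(wrapped)
        except (ValueError, TemplateSyntaxError):
            logger.warning(
                "Skipping ToxicChat entry %s: Jinja2-incompatible content",
                item.get("conv_id"),
            )
            continue
        seed_prompts.append(wrapped)
    return seed_prompts

data = [
    {"conv_id": "a1", "user_input": "plain text with {{ braces }}"},
    {"conv_id": "b2", "user_input": "hostile {% endraw %} breakout"},
]
prompts = build_seed_prompts(data)
assert len(prompts) == 1  # the hostile entry is skipped; the load still succeeds
```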
…hat-dataset
# Conflicts:
#	doc/code/datasets/1_loading_datasets.ipynb